Homework 5

Mareike Haaren

Question 1: Create a Python file that scrapes GDP by country from the web and plots an interactive stacked bar chart using Plotly

In [1]:
import requests as rq
import bs4
import pandas as pd
import plotly.express as px
import numpy as np
In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)'
page = rq.get(url)
# Print the first 99 characters just to see what the response looks like
page.text[0:99]
Out[2]:
'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<titl'
In [3]:
# Parse the HTML; find() returns only the first table with class "wikitable"
bs4page = bs4.BeautifulSoup(page.text, 'html.parser')
tables = bs4page.find('table', {'class': "wikitable"})
In [4]:
# header=[1] takes the column names from the table's second header row
GDP = pd.read_html(str(tables), header=[1])[0]
In [5]:
GDP.columns = ["Country", "Region", "IMF_Estimate", "IMF_Year",
               "UN_Estimate", "UN_Year", "WB_Estimate", "WB_Year"]
GDP
Out[5]:
            Country    Region  IMF_Estimate   IMF_Year  UN_Estimate    UN_Year  WB_Estimate  WB_Year
0     United States  Americas    22939580.0       2021   20893746.0       2020   20936600.0     2020
1             China      Asia    16862979.0  [n 2]2021   14722801.0  [n 3]2020   14722731.0     2020
2             Japan      Asia     5103110.0       2021    5057759.0       2020    4975415.0     2020
3           Germany    Europe     4230172.0       2021    3846414.0       2020    3806060.0     2020
4    United Kingdom    Europe     3108416.0       2021    2764198.0       2020    2707744.0     2020
...             ...       ...           ...        ...          ...        ...          ...      ...
211        Kiribati   Oceania         232.0       2021        181.0       2020        200.0     2020
212           Palau   Oceania         208.0       2021        264.0       2020        268.0     2019
213           Nauru   Oceania         133.0       2021        135.0       2020        118.0     2019
214      Montserrat  Americas           NaN        NaN         68.0       2020          NaN      NaN
215          Tuvalu   Oceania          65.0       2021         55.0       2020         49.0     2020

216 rows × 8 columns
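
The `IMF_Year` and `UN_Year` values for China in the output above still carry Wikipedia footnote markers (`[n 2]2021`, `[n 3]2020`). One way to clean them is sketched below. `clean_year` is a hypothetical helper name, not part of the original notebook, and the sketch assumes the markers always appear as bracketed text.

```python
import re

# Matches a bracketed footnote marker such as "[n 2]" (assumed format)
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def clean_year(value):
    """Strip footnote markers from a year cell (hypothetical helper)."""
    if not isinstance(value, str):
        # Leave NaN and already-numeric values untouched
        return value
    cleaned = FOOTNOTE.sub("", value).strip()
    # Convert to int only when the remainder is purely digits
    return int(cleaned) if cleaned.isdigit() else cleaned
```

It could then be applied per column, for example `GDP['IMF_Year'] = GDP['IMF_Year'].map(clean_year)`, and likewise for `UN_Year` and `WB_Year`.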

In [6]:
# px.bar stacks each country's segment within its region by default
figIMF = px.bar(GDP, x='Region', y='IMF_Estimate', color='Country',
                title="IMF GDP Estimates by Region")
figIMF.show()
In [7]:
figUN = px.bar(GDP, x='Region', y='UN_Estimate', color='Country',
               title="UN GDP Estimates by Region")
figUN.show()
In [8]:
figWB = px.bar(GDP, x='Region', y='WB_Estimate', color='Country',
               title="World Bank GDP Estimates by Region")
figWB.show()
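
With more than 200 countries as colour categories, the stacked bars and their legend can be hard to read. A possible refinement, sketched below, keeps the largest countries in each region and sums the rest into a single "Other" segment. `collapse_small_countries` and `top_n` are hypothetical names introduced here, and the sketch assumes the `Country` and `Region` column names assigned to `GDP` earlier.

```python
import pandas as pd


def collapse_small_countries(df, value_col, top_n=5):
    """Keep the top_n countries per region; sum the rest into 'Other'.

    Hypothetical helper: assumes 'Country' and 'Region' columns exist.
    """
    # Drop rows with no estimate so they do not affect the ranking
    df = df.dropna(subset=[value_col]).copy()
    # Rank countries within each region, largest value first
    rank = df.groupby("Region")[value_col].rank(method="first", ascending=False)
    # Relabel every country outside the top_n of its region
    df["Country"] = df["Country"].where(rank <= top_n, "Other")
    # Sum the relabelled rows so each region has a single 'Other' segment
    return df.groupby(["Region", "Country"], as_index=False)[value_col].sum()
```

The result could be passed to the same plotting call, for example `px.bar(collapse_small_countries(GDP, 'IMF_Estimate'), x='Region', y='IMF_Estimate', color='Country')`. The choice of `top_n=5` is arbitrary and only for illustration.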